Creating software for environmental measurements based on GPS data - twsagps package report

Author

Igor Graczykowski

Introduction

This report comprehends the present state of twsagps package - current issues and further development plans. Package consists of four functions PO_exposure(), KDE_exposure(), DR_exposure() and LS_exposure() each utilizing one Time Weighted Spatial Averaging (TWSA) method. The TWSA methods are taken from the reference paper (Jankowska et al. 2023) on which the package is based upon - Point Overlay (PO), Kernel Density Estimation (KDE), Density Ranking (DR) with the addition of Line Segment (LS). Report demonstrates code workflow and outcomes.

For this demo following packages were used:

library(terra)
library(sf)
library(tmap) 
library(dplyr)
library(tmaptools)
library(twsagps)

Research area and sample data

For presenting package functioning Geolife GPS Trajectories 1.3 Data (Zheng et al. 2009), (Zheng et al. 2008), (Yu Zheng 2010) was used as sample GPS data. The Geolife dataset consists of significant volumes of time-stamped GPS data retrieved from 182 pariticipants in over five year time span. Research area was limited to San Diego county and GPS data was restricted to the research area. To improve running time of functions Geolife data was reduced to each 10th record. The GPS data was collected between 17th August 2011 and 20th August 2011.

geol_file = read.csv("geolife_san_diego.csv")
geolife_days = geol_file |> select(-X) |> 
  slice(which(row_number() %% 10 == 1))
geolife_time = geolife_days |>
  mutate(dateTime = paste(date, time), 
         dateTime = as.POSIXct(dateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
Geolife records with dateTime
lat lon date time dateTime
33.26486 -117.4323 2011-08-17 21:38:11 2011-08-17 21:38:11
33.26239 -117.4301 2011-08-17 21:38:21 2011-08-17 21:38:21
33.25969 -117.4280 2011-08-17 21:38:31 2011-08-17 21:38:31
33.25700 -117.4259 2011-08-17 21:38:41 2011-08-17 21:38:41
33.25437 -117.4238 2011-08-17 21:38:51 2011-08-17 21:38:51
33.25162 -117.4216 2011-08-17 21:39:01 2011-08-17 21:39:01

NDVI raster data was applied as sample environmental layer. Indicator was calculated from a Landsat 8 image involving San Diego county which was captured on 22nd Februrary 2024.

landsat_red = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B4.TIF")
landsat_nir = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B5.TIF")
landsat_ndvi = (landsat_nir - landsat_red) / (landsat_nir + landsat_red)
landsat_red = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B4.TIF")
landsat_nir = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B5.TIF")
landsat_ndvi = (landsat_nir - landsat_red) / (landsat_nir + landsat_red)

NDVI data (credits - U.S. Geological Survey) with Geolife San Diego data

Methods

Point Overlay (PO)

Point Overlay method is implemented in PO_exposure() function. Evaluated by converting GPS point vector data to raster data type using sum function. …

Kernel Density Estimation (KDE)

KDE_exposure() utilizes the KDE method. Method generates smoothed buffers aroud GPS data considering differences in spatial distribution and influence. Function calculates exposure from GPS points coordinates on matrix of coordinates using kde() from TDA package. …

Density Ranking (DR)

Density Ranking method in DR_exposure() is based on DR function from yenchic/density_ranking GitHub repository[Chen and Dobra (2020)](Chen 2019) and implemented as DR_simple() adapted to processing data in projected coordinates systems. DR serves as a improvement to KDE method by ranking KDE values (Jankowska et al. 2023). …

Line Segment (LS)

New addition in TWSA approaches is a Line Segment method applied in LS_exposure() drawn from michaeldgarber/microclim-static-v-dynam GitHub repository. Code applied in LS_exposure() consists of modified functions from the repository’s demo. Line Segment puts emphasis on time component of GPS data. Firstly, trajectories following GPS movement and buffers around each trajectory segment are created. Secondly, weighted values for each segment are extracted from environmental layer and combined with time elapsed between every two GPS records. Obtained polygons are converted to raster with sum function afterwards.

Processing

Parameters

User may provide various arguments to functions in twsagps package

  • x - data frame of GPS data with lon/lat columns

  • cellsize - size of cells in output raster

  • normalize - if TRUE activity space raster values will be rescaled to 0-1 range

  • bandwidth - bandwidth (DR/KDE) size of buffer in meters (LS)

  • time_data - column name containing time data of GPS data (LS)

  • time_unit - unit of time weights (LS)

  • stats - output statistics calculated on output raster

  • start_crs - coordinate reference system of GPS data

  • end_crs - coordinate reference system of output layer

  • env_data - environmental data raster

x = geolife_time
cellsize = 50
normalize = TRUE
bandwidth = 200 # in meters
time_data = "dateTime"
time_unit = "mins"
stats = c("count", "area", "min", "max", "range", "mean", "std", 'sum')
start_crs = "WGS84"
end_crs = crs(landsat_ndvi)
env_data = landsat_ndvi


# KDE and DR bandwidths discussed in Package Issues
bandwidth_kde = 25
bandwidth_dr = 70

Entry processing

Entry processing of GPS data and environmental data shared by all functions. Transforming GPS data to spatial data object and projecting GPS data and environemtal raster to common CRS.

 # get spatial data

geol_points = terra::vect(x = geolife_time, geom = c("lon", "lat"), crs = start_crs)

# crs
if (!is.null(end_crs)){
  geol_proj = terra::project(geol_points, terra::crs(end_crs))
} else if (!is.null(env_data)) {
  geol_proj = terra::project(geol_points, terra::crs(env_data))
} else {
  geol_proj = geol_points
}

if (!is.null(env_data)){ # change env_data crs beforehand
  env_data_proj = terra::project(env_data, terra::crs(geol_proj))
}

Extent

Calculating extented extent for DR/KDE/LS methods for the purpose of preventing cropping cells with values. DR and KDE extents are expanded by constant 5% of former boundary box. Preceding computing of new extent in LS trajectories and buffers are processed. Subsequantly, the extent is based on buffer vector layer boundary box.

### DR/KDE ###

buff_const = 0.05 #percent of extent as bbox

# get extent
geol_extent = terra::ext(geol_proj)

# buffer around extent
x_const_dr = (terra::xmax(geol_extent) - terra::xmin(geol_extent)) * buff_const
y_const_dr = (terra::ymax(geol_extent) - terra::ymin(geol_extent)) * buff_const

# new extent
new_extent = c(terra::xmin(geol_extent) - x_const_dr,
               terra::xmax(geol_extent) + x_const_dr,
               terra::ymin(geol_extent) - y_const_dr,
               terra::ymax(geol_extent) + y_const_dr)
Code
### LS function ###

trajectories_fun = function(data){
  trajectories_out = data |>
    sf::st_as_sf() |> # change to sf object to change points to linestring later
    #filter to the study id defined by the argument
    #define start and end points of line
    dplyr::mutate(
      line_id = dplyr::row_number(),#an id for each "line segment"
      x_start= sf::st_coordinates(geometry)[,1],
      y_start= sf::st_coordinates(geometry)[,2],
      x_end = dplyr::lead(x_start),
      y_end = dplyr::lead(y_start)
    ) |>
    dplyr::ungroup() |>
    sf::st_set_geometry(NULL) |>
    #exclude the last observation, which has no "lead", and will be missing.
    dplyr::filter(is.na(x_end)==FALSE) |>
    # Make the data long form so that each point has two observations
    tidyr::pivot_longer(
      # select variables to pivot longer.
      cols = c(x_start, y_start, x_end, y_end),
      #value goes to "x/y", and time goes to "_start/end"
      names_to = c(".value", "time"),
      names_repair = "unique",
      names_sep = "_"#the separator for the column name
    ) |>
    # create sf object once again
    sf::st_as_sf(coords = c("x", "y"), crs= sf::st_crs(data)) |>
    dplyr::group_by(line_id) |>
    dplyr::summarize(do_union = FALSE) |>
    sf::st_cast("LINESTRING") |> # cast linestring type
    sf::st_as_sf() |>
    dplyr::ungroup()
  return(trajectories_out)
}
### LS ###

# create line segments from spatial points
trajectories = terra::vect(trajectories_fun(geol_proj))

# create buffers over line segments
traj_buff = trajectories |>
  tidyterra::select(line_id) |>
  terra::buffer(bandwidth) # takes a lot of time

#
extent_buff = terra::ext(traj_buff)
tmap mode set to interactive viewing

Line Segment buffers with trajectories

Grid Raster

Compute Grid Raster utilized in calculating output raster. Grid consists of previously specified extent, cellsize argument or environmental data raster.

## PO
extent = terra::ext(geol_proj)
## KDE/DR
# extent = new_extent
## LS
# extent = extent_buff

# implement cellsize > 0 condition
if (is.numeric(cellsize)) { # cellsize included

  if (terra::linearUnits(geol_proj) == 0){ # crs units in degrees
    
    dist_lon = geosphere::distm(c(extent[1], extent[3]), c(extent[2], extent[3]),
                                fun = geosphere::distHaversine)
    dist_lat = geosphere::distm(c(extent[1], extent[3]), c(extent[1], extent[4]),
                                fun = geosphere::distHaversine)
    # number of cells
    x_cells = (dist_lon / cellsize) |> as.integer()
    y_cells = (dist_lat / cellsize) |> as.integer()

    # empty_rast for units in degrees
    grid_rast = terra::rast(crs = terra::crs(geol_proj), nrows=y_cells,
                            ncols=x_cells, extent = extent)

  } else {
    grid_rast = terra::rast(crs = terra::crs(geol_proj), extent = extent,
                             resolution = cellsize)
  }
  

} else if (!is.null(env_data)){ #if incorrect cellsize and env_data exists

  grid_rast = env_data_proj
  terra::ext(grid_rast) = extent

}

Activity Space

Point Overlay

Activity space of Point Overlay method is evaluated by converting points vector data to raster using sum function.

geol_proj$rast = 1

rast_points = terra::rasterize(geol_proj, grid_rast, field = "rast", fun = sum)

Kernel Density Eestimation/Density Ranking

KDE and DR functions implemented in KDE_exposure() and DR_exposure() require data frames of GPS points coordinates and coordinates of every cell of Grid Raster (KDE) or properties of the Grid Raster (DR) as inputs.

### KDE ###

# number of cells
x_cells = terra::ncol(grid_rast)
y_cells = terra::nrow(grid_rast)

# coordinate seq
x_seq = seq(new_extent[1], new_extent[2], length.out = x_cells)
y_seq = seq(new_extent[3], new_extent[4], length.out = y_cells)

# coords of every cell
expand_grid = expand.grid(x_seq,y_seq)

# point coords
coords = terra::geom(geol_proj)[,3:4]

# kde functions from TDA package
kde_data = TDA::kde(coords, Grid = expand_grid, h = sqrt(bandwidth_kde))

kde_rast = grid_rast
# insert kde values to raster
terra::values(kde_rast) = kde_data

kde_rast = terra::flip(kde_rast, direction='vertical') # flip raster
### DR ###

# point coords
coords = terra::geom(geol_proj)[,3:4]

# number of cells

x_cells = terra::ncol(grid_rast)
y_cells = terra::nrow(grid_rast)



# calculate DR data
dr_data = DR_simple(coords, kernel= "Gaussian", h = bandwidth_dr,
                    x_res=x_cells, y_res=y_cells, xlim=new_extent[1:2],
                    ylim=new_extent[3:4])

dr_rast = grid_rast
# insert dr values to raster
terra::values(dr_rast) = dr_data$gr_alpha

dr_rast = terra::flip(dr_rast, direction='vertical') # flip raster

Activity Space Rasters of PO, KDE and DR methods

Normalization

Rescale activity space raster’s values to 0-1 range.

### PO ###

fin_rast = rast_points

if (normalize == TRUE){

  fin_rast = terra::subst(fin_rast, from = NA, to = 0) # proper range for normalization
  rast_minmax = terra::minmax(fin_rast) # minmax
  # calculate normalization to 0-1 range
  fin_rast = (rast_points - rast_minmax[1,])/(rast_minmax[2,]-rast_minmax[1,])
  fin_rast = terra::subst(fin_rast, from = 0, to = NA) # insert NA

}
### KDE/DR ###

## KDE
fin_rast = kde_rast
## DR
# fin_rast = dr_rast


if (normalize == TRUE){ # normalize values

  rast_minmax = terra::minmax(fin_rast) # minmax
  # calculate normalization to 0-1 range
  fin_rast = (fin_rast - rast_minmax[1,])/(rast_minmax[2,]-rast_minmax[1,])
}

fin_rast = terra::subst(fin_rast, from = 0, to = NA)

Normalized Activity Space for PO, KDE and DR methods

Environmental exposure

Point Overlay/Kernel Density Estimation/Density Ranking

Environmental exposure layer is computed by multiplying activity space layer and environmental data layer.

if (!is.null(env_data)){ # calculate exposure
  env_data_resamp = terra::resample(env_data_proj, fin_rast)
  rast_env = fin_rast * env_data_resamp
}

Line Segment

Computing environmental exposure layer in Line Segment method requires assigning environmental data raster values to Grid Raster. Creating activity space demands allocating all Grid Raster values to 1.

if (!is.null(env_data)){ # env_data included
  # replace vals of grid with env values
  env_resamp = terra::resample(env_data_proj, grid_rast)
  terra::values(grid_rast) = terra::values(env_resamp)
} else {
  terra::values(grid_rast) = 1
}

Consecutively, Grid Raster values are extracted to trajectory buffer and later on summarized by area overlap for each segment. Time elapsed between every two GPS data points are calculated in order to take cognistance of time weights. Finally, vector data is converted to raster based on exposure and time weights with sum function.

# extract values and weights (area overlap) from grid for each cell of buffer
traj_extract= grid_rast |>
  terra::extract(traj_buff, na.rm=TRUE, weights = TRUE) |>
  dplyr::as_tibble() |>
  dplyr::rename(
    line_id = ID,#rename this to line id
    e=2#second column is the exposure.
  )
  
# calculate summarized exposure for each buffer
traj_extract_line_id = traj_extract |>
  dplyr::group_by(line_id) |>
  dplyr::summarize(
    #Jan 9, 2024 use R's built-in weighted.mean() function
    #instead of calculating weighted average manually
    e=stats::weighted.mean(
      x=e,
      w=weight,
      na.rm=T),
    #These weights are based on the areal overlap, not time
    sum_weights = sum(weight,na.rm=T),
    n_pixel = dplyr::n() # number of observations corresponds to number of pixels per line segment
  ) |>
  dplyr::ungroup()

# check if time_data column provided


if (!(is.null(geol_proj[time_data]))){
  
  duration_line_id = geolife_time |>
    dplyr::mutate(time_elapsed = as.numeric(difftime(dplyr::lead(.data[[time_data]]),
                                                     .data[[time_data]],
                                                     units = time_unit)),
                  line_id = dplyr::row_number()) |>
    dplyr::select(line_id, time_elapsed)
  
  traj_extract_line_id = traj_extract_line_id |>
    #now link in the time weight
    dplyr::left_join(duration_line_id,by=c("line_id")) |>
    dplyr::mutate(
      #calculate a weight that considers both area overlap and time
      end_weights = e * time_elapsed
    )
  } else {
    traj_extract_line_id = traj_extract_line_id |>
      dplyr::mutate(end_weights = e) # end result is exposure without time
  }

# join weights with spatial buffer
weight_buff = dplyr::inner_join(traj_buff, traj_extract_line_id, by = 'line_id')
  
# rasterize results
fin_rast = terra::rasterize(weight_buff, grid_rast, field = "end_weights", fun = sum)
traj_extract - Extracted values for each raster cell
line_id e weight
1 0.1510287 0.2776131
1 0.3074365 0.8561898
1 0.2196987 1.0000000
1 0.2476736 1.0000000
1 0.2691458 0.8558192
1 0.2917254 0.2783543
1 0.0974029 0.2394366
1 0.0545786 0.9831357
1 0.1115811 1.0000000
1 0.2017791 1.0000000
traj_extract_line_id - Extracted values for each line segment with weights
line_id e sum_weights n_pixel time_elapsed end_weights
1 0.1818616 104.5206 127 0.1666667 0.0303103
2 0.1825105 107.2934 132 0.1666667 0.0304184
3 0.1844809 107.2161 135 0.1666667 0.0307468
4 0.1678029 105.7779 135 0.1666667 0.0279672
5 0.1756641 109.1288 137 0.1666667 0.0292774
6 0.2133046 111.2917 140 0.1666667 0.0355508
7 0.1915251 109.1753 137 0.1666667 0.0319208
8 0.1610424 110.9446 137 0.1666667 0.0268404
9 0.1730918 110.1436 137 0.1666667 0.0288486
10 0.1926430 103.0510 127 0.1666667 0.0321072

Weighted line segments buffers

Raster statistics

Statistics are calculated utilizing terra::global() function.

Code
# Output raster and statistics calculation

output_calc = function(rast, stats = NULL){

  output = list()
  output$rast = rast

  if (!is.null(stats)){ # activity stats
    raster_stats = rast_stats(rast, stats)
    output$stats = raster_stats
  }
  return(output)
}

# raster statistics
rast_stats = function(raster, stats){
  vals = data.frame(matrix(ncol = length(stats)))
  colnames(vals) = stats
  for (statistic in stats){
    if (statistic == "range"){
      range = terra::global(raster, fun = statistic, na.rm = TRUE)
      value = range[,2] - range[,1]
    } else if (statistic == "count"){
      value = raster |> terra::freq() |> dplyr::summarise(n = sum(count)) |>
        as.integer()
    } else if (statistic == "area") {
      value = raster |> terra::expanse() |> dplyr::select(area) |> as.integer()
    } else {
      value = terra::global(raster, fun = statistic, na.rm = TRUE) |> as.numeric()
    }
    vals[statistic] = value

  }
  return(vals)
}
if (!is.null(env_data)){  # env_data output 
  output = output_calc(rast_env, stats = stats)
} else {
  # calculate activity output
  output = output_calc(fin_rast, stats = stats)

}

Output

Output of twsagps package functions is a list consisting of enviromental exposure raster and statistics data frame.

Environmental exposure layers for all methods

Outputs statistics for all TWSA methods
Method count area min max range mean std sum
PO 1089 2724647 -0.0162026 0.0424337 0.0586363 0.0042731 0.0043587 4.653353
KDE 18551 46413985 -0.0017587 0.0537845 0.0555432 0.0001080 0.0013325 2.003008
DR 4517 11301403 -0.0572428 0.1680100 0.2252528 0.0200316 0.0203706 90.482557
LS 20904 52301083 -0.0810245 527.6466675 527.7276920 2.5270283 26.2162668 52824.999407

Package issues

Terra package

R is unable to save terra spatial objects as variables in environment. Only way to use terra objects after reopeing the R session is to use terra::wrap() and terra::unwrap() functions.

output_wrap = wrap(output$rast)
output$rast = unwrap(output_wrap)

Several functions from the package are time-consuming and ability to reuse them in R session would be beneficial. Should we consider converting to other package or entrust the matter of saving variables to the user?

Line Segment normalization

Due to different workflow of Line Segment method (environmental exposure layer created by extracting cell values from grid raster [ Figure 1] , not like in other methods by simply multiplying activity raster by environmental layer) applying activity raster normalization is problematic and cannot be accomplished solely by rescaling activity raster values.

Figure 1: Line Segment method workflow

Implementing normalization to end weights attribute is pointless as rasterization with sum function exceeds values out of 0-1 range. Additionally, normalized activity space raster has no practical value as exposure is calculated before rasterization. Is applying normalization to Line Segment method viable without altering the method itself?

KDE/DR smoothing parameter - TDA package

DR and KDE methods utilize kde() R function from TDA package which’s bandwidth is a smoothing parameter. Bandwidth should correspond to kernel search radius in meters but in this package results differ. It is presumably induced by distinct math formulas used in reference paper (Jankowska et al. 2023) and in TDA::kde() function. In the paper formula (Silverman 1986) h is defined as size of kernel (Jankowska et al. 2023).

P(x,y) = \frac{1}{nh^2} \sum^n_{i=1} K(\begin{array} {c} d_i(x,y) \\ h \end{array})

In package formula (Wasserman 2004) h is a smoothing parameter which is additionally squared.

pX(x) = \frac{1}{n(\sqrt{2\pi} h)^d} \sum_{i=1} ^ n \exp(\frac{- \Vert x - X_i \Vert_2^2}{2h^2})

Following the advice of the author of Density Ranking function, Yen-Chi Chen, smoothing parameter of twsagps::KDE_exposure() and code previously discussed is square rooted in order to compensate for varying formulas. h parameter modifications do not nullify spatial inconsistencies. Comparison of TDA::kde() kernel search radius and adjusted dissolved buffer presents spatial inaccuracy of the function.

TDA KDE with √200, √100 and √25 smoothing parameters with corresponding dissolved buffers

For the purpose of comparison alternative kde function from SpatialKDE package has been utilized in twsagps::KDE_slow(). Function’s outputs are suitable and kernel search radius is operating as intented but running time is too excessive to substitute the TDA::kde(). Comparing kde() functions remarkably low values of TDA and spatial differences are noticeable.

spatial_kde = SpatialKDE::kde(sf::st_as_sf(geol_proj), band_width = bandwidth,
                                  grid = raster::raster(grid_rast))
spatial_kde = terra::rast(spatial_kde) 

TDA and SpatialKDE - activity space

TDA and SpatialKDE - normalized NDVI exposure

TDA and SpatialKDE - NDVI exposure statistics
Package count area min max range mean std sum
TDA 18551 46413985 -0.0017587 0.0537845 0.0555432 0.0001080 0.0013325 2.003008
SpatialKDE 19420 48588191 -0.0070229 0.1343125 0.1413355 0.0014399 0.0037505 27.963379

DR function from density-ranking utilizes Empirical Cumulative Distribution Function on GPS data coordinates’ KDE and all cells coordinates’ KDE.

coords = terra::geom(geol_proj)[,3:4]
expand_grid = expand.grid(x_seq,y_seq)

D_kde = TDA::kde(X = coords, Grid = coords, h = bandwidth_dr)
grid_kde = TDA::kde(X = coords, Grid = expand_grid, h = bandwidth_dr)

DR_end = ecdf(D_kde)(grid_kde)

Due to processing of data values abnormalities are partially mitigated and smoothing parameter is modified. As h parameter is defined in units of coordinates, bandwidths in degrees to corresponding search radiuses in meters (0.001 - 400 m, 0.007 - 200 m, 0.005 - 100 m). These may serve as a reference in evaluating smoothing parameter values in meters.

Comparison of DR with various smoothing parameters.

With the intention of using the bandwidth in meters in KDE and DR twsagps functions, method of transforming h parameter from distance unit to h applied in TDA::kde() math formula should be derived.

Further development

Implementing extent parameter that enables specifying research area would grant increased control over formation of grid raster and allow removal of incorrectly located data. Functions should handle sf bbox and terra SpatExtent class objects as well as vector data of 4 length (xmin, xmax, ymin, ymax).

Package should process vector data as their environmental data input as it would enable usage of sf and terra SpatVector objects. Additionally, user should have the opportunity to generate buffer around the vector data to embody spatial influence of enviromental objects.

User ought to be avaible to insert spatial data in form different from data frame. Implementing handling spatial class object of terra SpatVector and sf points would broaden variety of accesible data.

Applying iteration over data groups would allow to derive distinct seperate IDs, types of mobility and dates in seperate outputs.

References

Chen, Yen-Chi. 2019. “Generalized Cluster Trees and Singular Measures.” The Annals of Statistics 47 (4). https://doi.org/10.1214/18-aos1744.
Chen, Yen-Chi, and Adrian Dobra. 2020. “Measuring Human Activity Spaces from GPS Data with Density Ranking and Summary Curves.” The Annals of Applied Statistics 14 (1). https://doi.org/10.1214/19-aoas1311.
Jankowska, Marta M., Jiue-An Yang, Nana Luo, Chad Spoon, and Tarik Benmarhnia. 2023. “Accounting for Space, Time, and Behavior Using GPS Derived Dynamic Measures of Environmental Exposure.” Health & Place 79 (January): 102706. https://doi.org/10.1016/j.healthplace.2021.102706.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Vol. 37. 1. Chapman Hall. http://books.google.com/books?hl=en&lr=&id=e-xsrjsL7WkC&oi=fnd&pg=PR9&dq=Density+Estimation+for+Statistics+and+Data+Analysis&ots=ivSonp7D_q&sig=XfZFlEyzmSO4nm54dgq22EiW9iA.
Wasserman, Larry. 2004. All of Statistics. Springer New York. https://doi.org/10.1007/978-0-387-21736-9.
Yu Zheng, Wei-Ying Ma, Xing Xie. 2010. “GeoLife: A Collaborative Social Networking Service Among User, Location and Trajectory.” IEEE Data Engineering Bulletin 33 (2): 32–44. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/GeoLife-A20Collaborative20Social20Networking20Service20among20User2C20Location20and20Trajectory.pdf.
Zheng, Yu, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. “Understanding Mobility Based on GPS Data.” Proceedings of the 10th International Conference on Ubiquitous Computing, September. https://doi.org/10.1145/1409635.1409677.
Zheng, Yu, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. “Mining Interesting Locations and Travel Sequences from GPS Trajectories.” Proceedings of the 18th International Conference on World Wide Web, April. https://doi.org/10.1145/1526709.1526816.